Attention Is All You Need
https://arxiv.org/abs/1706.03762
Extends the attention mechanism in an original way and proposes the Transformer model architecture
Transformer models are flourishing today (e.g. (on the to-read pile) A Survey of Transformers (2021))
Self-attention
https://github.com/tensorflow/tensor2tensor#walkthrough
tensor2tensor
Translation is one example of a sequence transduction task
Abstract
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
Proposes a new, simple network architecture: the Transformer
The Transformer is based solely on the attention mechanism, dispensing with recurrent and convolutional networks entirely
Machine translation tasks
Outperformed the existing best results
Parallelizable (ref: 8.5.2 Transformer); see the attention sketch below
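To make "based solely on attention mechanisms" and "parallelizable" concrete, here is a minimal NumPy sketch of the scaled dot-product attention defined in Section 3.2.1, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The function name, array shapes, and toy sizes are illustrative assumptions, not taken from the paper's tensor2tensor code.
```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention (Section 3.2.1 of the paper).

    Q, K: (seq_len, d_k), V: (seq_len, d_v).
    Every position attends to every position in a single matrix product,
    which is why the computation parallelizes across the sequence,
    unlike a recurrent layer that must step through tokens one by one.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # (seq_len, seq_len) logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                         # weighted sum of values

# Toy self-attention: queries, keys, and values all come from the same sequence.
x = np.random.randn(5, 8)   # 5 tokens, d_model = 8 (illustrative sizes)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)            # (5, 8)
```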
7. Conclusion
In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.
For translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers.
In other words, on translation tasks the Transformer trains significantly faster than recurrent- or convolution-based architectures
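To make the conclusion's "multi-headed self-attention" concrete, here is a hedged NumPy sketch following the paper's definition MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O with head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V) (Section 3.2.2). The weights are randomly initialized and the function name and dimensions are illustrative assumptions, not the paper's implementation.
```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, h=2, seed=0):
    """Multi-head self-attention in the style of Section 3.2.2 (toy weights).

    x: (seq_len, d_model); d_model must be divisible by the number of heads h.
    Each head projects x down to d_k = d_model / h dimensions, runs scaled
    dot-product attention, and the concatenated heads are projected by W_O.
    """
    rng = np.random.default_rng(seed)
    seq_len, d_model = x.shape
    d_k = d_model // h
    heads = []
    for _ in range(h):
        # Per-head projections W_Q, W_K, W_V (randomly initialized for illustration).
        W_Q, W_K, W_V = (rng.standard_normal((d_model, d_k)) for _ in range(3))
        Q, K, V = x @ W_Q, x @ W_K, x @ W_V
        scores = Q @ K.T / np.sqrt(d_k)        # (seq_len, seq_len)
        heads.append(softmax(scores) @ V)      # (seq_len, d_k)
    W_O = rng.standard_normal((d_model, d_model))
    return np.concatenate(heads, axis=-1) @ W_O  # (seq_len, d_model)

out = multi_head_self_attention(np.random.randn(5, 8), h=2)
print(out.shape)  # (5, 8)
```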
3 Model Architecture
4 Why Self-Attention
5 Training (Attention Is All You Need)
TODO Attention Visualization (Appendix) Figure 3